Kernel density-based acoustic model with cross-lingual bottleneck features for resource limited LVCSR
نویسندگان
چکیده
Conventional acoustic models, such as Gaussian mixture models (GMM) or deep neural networks (DNN), cannot be reliably estimated when there are very little speech training data, e.g. less than 1 hour. In this paper, we investigate the use of a non-parametric kernel density estimation method to predict the emission probability of HMM states. In addition, we introduce a discriminative score calibrator to improve the speech class posteriors generated by the kernel density for speech recognition task. Experimental results on the Wall Street Journal task show that the proposed acoustic model using cross-lingual bottleneck features significantly outperforms GMM and DNN models for limited training data case.
منابع مشابه
Cross-lingual and multi-stream posterior features for low resource LVCSR systems
We investigate approaches for large vocabulary continuous speech recognition (LVCSR) system for new languages or new domains using limited amounts of transcribed training data. In these low resource conditions, the performance of conventional LVCSR systems degrade significantly. We propose to train low resource LVCSR system with additional sources of information like annotated data from other l...
متن کاملCross-Lingual and Ensemble MLPs Strategies for Low-Resource Speech Recognition
Recently there has been some interest in the question of how to build LVCSR systems for the low-resource languages. The scenario we focus on here is having only one hour of acoustic training data in the “target” language, but more plentiful data in other languages. This paper presents approaches using MLP based features: we construct a low-resource system with additional sources of information ...
متن کاملContext-dependent phone mapping for LVCSR of under-resourced languages
This paper presents a context-dependent phone mapping approach for acoustic modeling of large vocabulary speech recognition for under-resourced languages by leveraging on well trained models of other languages. Generally speaking, phone mapping can be considered as a hybrid HMM/MLP (Hidden Markov Model / Multilayer Perceptron) model where the input of the MLP is phone acoustic scores, e.g. like...
متن کاملRecent improvements in neural network acoustic modeling for LVCSR in low resource languages
In this paper we focus on several techniques that improve deep neural network (DNN) acoustic modeling for low-resource languages. We explore the use of different features such as, fundamental-frequency variation (FFV), tonal features, and normalization of these features for deep neural network training. Specifically we study the impact of these features in conjunction with a tonal lexicon and s...
متن کاملCross Lingual Modelling Experiments for Indonesian
The extension of Large Vocabulary Continuous Speech Recognition (LVCSR) to resource poor languages such as Indonesian is hindered by the lack of transcribed acoustic data and appropriate pronunciation lexicons. Research has generally been directed toward establishing robust cross-lingual acoustic models, with the assumption that phonetic lexicons are readily available. This is not the case for ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2014